Sequence-to-Sequence ASR Optimization via Reinforcement Learning

Authors

Andros Tjandra

Sakriani Sakti

Satoshi Nakamura

Abstract

Despite the success of sequence-to-sequence approaches in automatic speech recognition (ASR), these models still suffer from several problems, mainly due to a mismatch between training and inference conditions. In the sequence-to-sequence architecture, the model is trained to predict the grapheme at the current time step given the speech signal and the ground-truth grapheme history of the previous time steps. During inference, however, the ground-truth history is unavailable, and it remains unclear how well a model trained this way handles real-world speech. The model must generate the whole transcription from scratch, conditioned on its own previous predictions, which is difficult and allows errors to propagate over time. Furthermore, the model is optimized to maximize the likelihood of the training data rather than the error-rate metrics that actually quantify recognition quality. This paper presents an alternative strategy for training sequence-to-sequence ASR models that adopts ideas from reinforcement learning (RL). Unlike the standard training scheme based on maximum likelihood estimation, our proposed approach uses the policy gradient algorithm, which lets us (1) sample whole transcriptions from the model's own predictions during training and (2) directly optimize the model with the negative Levenshtein distance as the reward. Experimental results show that our approach significantly improves performance compared to a model trained only with maximum likelihood estimation.
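To make the training signal described above concrete, here is a minimal Python sketch of a policy gradient (REINFORCE) update driven by a negative Levenshtein reward. It is an illustration under stated assumptions, not the authors' implementation: the policy is a hypothetical toy that emits each grapheme from an independent softmax over per-step logits (no audio, no encoder-decoder), the reward is a single sequence-level value, and the batch-mean baseline is our own variance-reduction choice. All names (`levenshtein`, `reward`, `reinforce_grad`, `sample`) and the three-letter vocabulary are hypothetical.

```python
import math
import random
from typing import List, Sequence


def levenshtein(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Edit distance (substitutions, insertions, deletions) between grapheme sequences."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free when graphemes match)
            )
        prev = curr
    return prev[-1]


def reward(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Sequence-level reward: the negative Levenshtein distance to the reference."""
    return -float(levenshtein(ref, hyp))


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def reinforce_grad(logits, samples, ref, vocab):
    """Monte Carlo estimate of d E[reward] / d logits for the toy policy.

    Hypothetical toy policy: step t emits vocab[k] with probability
    softmax(logits[t])[k], independently of other steps.
    The batch-mean reward is used as a baseline (an assumption, for variance reduction).
    """
    rewards = [reward(ref, s) for s in samples]
    baseline = sum(rewards) / len(rewards)
    grad = [[0.0] * len(vocab) for _ in logits]
    for sample_seq, r in zip(samples, rewards):
        advantage = r - baseline
        for t, token in enumerate(sample_seq):
            probs = softmax(logits[t])
            k = vocab.index(token)
            for j in range(len(vocab)):
                # d log softmax(z)_k / d z_j = 1[j == k] - softmax(z)_j
                grad[t][j] += advantage * ((j == k) - probs[j]) / len(samples)
    return grad


def sample(logits, vocab, rng):
    """Sample a whole transcription from the model's own predictions."""
    return [rng.choices(vocab, weights=softmax(z))[0] for z in logits]


if __name__ == "__main__":
    rng = random.Random(0)
    vocab = ["c", "a", "t"]
    ref = list("cat")
    logits = [[0.0] * len(vocab) for _ in ref]
    for _ in range(200):
        batch = [sample(logits, vocab, rng) for _ in range(16)]
        g = reinforce_grad(logits, batch, ref, vocab)
        # gradient ascent on expected reward, learning rate 0.5 (arbitrary)
        logits = [[z + 0.5 * dz for z, dz in zip(zt, gt)] for zt, gt in zip(logits, g)]
    print("".join(vocab[max(range(len(vocab)), key=lambda k: z[k])] for z in logits))
```

The design point this mirrors is that the gradient never needs the reward to be differentiable: the edit distance only scales the score-function term, so any error metric can serve as the reward. A real sequence-to-sequence model would replace the per-step logits with decoder outputs conditioned on the audio and on previously sampled graphemes, and the paper's exact reward and baseline definitions may differ from this sketch.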

Similar resources

Operation Sequencing Optimization in CAPP Using Hybrid Teaching-Learning Based Optimization (HTLBO)

Computer-aided process planning (CAPP) is an essential component in linking computer-aided design (CAD) and computer-aided manufacturing (CAM). Operation sequencing in CAPP is an essential activity. Each sequence of production operations which is produced in a process plan cannot be the best possible sequence every time in a changing production environment. As the complexity of the product incr...


From Weighted Classification to Policy Search

This paper proposes an algorithm to convert a T-stage stochastic decision problem with a continuous state space to a sequence of supervised learning problems. The optimization problem associated with the trajectory tree and random trajectory methods of Kearns, Mansour, and Ng, 2000, is solved using the Gauss-Seidel method. The algorithm breaks a multistage reinforcement learning problem into a...


A Distributed Reinforcement Learning Approach for Solving Optimization Problems

Combinatorial optimization is the search for one or more optimal solutions in a well-defined discrete problem space. Optimization methods are of great practical importance, particularly in engineering design, scientific experiments, and business decision-making. In this paper we investigate a distributed reinforcement learning based approach for solving combinato...


Two meta-heuristic algorithms for parallel machines scheduling problem with past-sequence-dependent setup times and effects of deterioration and learning

This paper considers identical parallel machines scheduling problem with past-sequence-dependent setup times, deteriorating jobs and learning effects, in which the actual processing time of a job on each machine is given as a function of the processing times of the jobs already processed and its scheduled position on the corresponding machine. In addition, the setup time of a job on each machin...


Lot Streaming in No-wait Multi Product Flowshop Considering Sequence Dependent Setup Times and Position Based Learning Factors

This paper considers a no-wait multi-product flowshop scheduling problem with sequence-dependent setup times. Lot streaming divides product lots into portions called sublots in order to reduce lead times and work-in-process and to increase machine utilization rates. The objective is to minimize the makespan. To clarify the system, a mathematical model of the problem is presented. Sin...

Journal: CoRR

Volume: abs/1710.10774

Publication date: 2017